Abstract

A number of regularization methods for discrete inverse problems consist in 
considering weighted versions of the usual least square solution. However, these 
so-called filter methods are generally restricted to monotonic transformations, e.g. 
the Tikhonov regularization or the spectral cut-off. In this paper, we point out that 
in several cases, non-monotonic sequences of filters are more efficient. We study a 
regularization method that naturally extends the spectral cut-off procedure to non- 
monotonic sequences and provide several oracle inequalities, showing the method to 
be nearly optimal under mild assumptions. Then, we extend the method to inverse 
problems with noisy operator and provide efficiency results in a newly introduced 
conditional framework. 
1 Introduction

We are interested in recovering an unobservable signal $x_0$, based on noisy observations
of the image of $x_0$ through a linear operator $A$. The observation $y$ satisfies the
following relation

$$y(t) = A x_0(t) + \varepsilon(t),$$

where $\varepsilon(.)$ is a random process representing the noise. This problem is studied in
[5], [12], [13] and in many applied fields such as medical imaging in [18] or seismography in
[19] for instance. When the measured signal is only available at a finite number of points
$t_1, ..., t_n$, the operator $A$ must be replaced by a discrete version
$A_n : x \mapsto (Ax(t_1), ..., Ax(t_n))'$, leading to a discrete linear model

$$y = A_n x_0 + \varepsilon,$$

with $y \in \mathbb{R}^n$. Difficulties in estimating $x_0$ occur when the problem is ill-posed,
in the sense that small perturbations in the observations induce large changes in the solution.
This is caused by an ill-conditioning of the operator $A_n$, reflected by a fast decay of its
spectral values $b_i$. In such problems, the least squares solution, although having a small
bias, is generally inefficient because its variance is too large. Hence, regularization of the
problem is required to improve the estimation. A large number of regularization methods are
based on considering weighted versions of the least squares estimator. The idea is to allocate
low weights $\lambda_i$, or filters, to the least squares coefficients that are highly
contaminated with noise, thus reducing the variance at the cost of increasing the bias.
The most famous filter-based method is arguably the one due to Tikhonov (see [20]),
where a collection of filters is indirectly obtained via a minimization procedure with
$\ell^2$ penalization. Tikhonov filters are entirely determined by a parameter $\tau$ that
controls the balance between the minimization of the $\ell^2$ norm of the estimator and the
residual.

Another widespread filter method, which will be given particular attention in this paper,
is the spectral cut-off discussed in [2], [9] and [11]. One simply considers a truncated
version of the least squares solution, where all coefficients corresponding to arbitrarily
small eigenvalues are removed. Thus, spectral cut-off is associated with binary filters
$\lambda_i$, equal to 1 if the corresponding eigenvalue $b_i$ exceeds in absolute value a
certain threshold $\tau$, and 0 otherwise.

A common feature of spectral cut-off and Tikhonov regularization is the predetermined
nature of the filters $\lambda_i$, defined in each case as a fixed non-decreasing function
$f(\tau, .)$ of the eigenvalues $b_i^2$, where only the parameter $\tau$ is allowed to depend
on the observations. However, in many situations, non-monotonic sequences of filters may
provide a more efficient estimation of $x_0$. Actually, optimal values for $\lambda_i$
generally depend on both the noise level, which is determined by the eigenvalue $b_i$, and the
component, say $x_i$, of $x_0$ in the direction associated to $b_i$. A restriction to monotonic
collections of filters turns out to be inefficient in situations where the coefficients $x_i$
are uncorrelated to the spectral values $b_i$ of the operator $A_n$.

Regularization methods involving more general classes of filters have also been treated
in the literature. In [5], the authors study a general procedure known as unbiased risk
estimation, which applies to arbitrary classes of filters and deals in particular with
non-monotonic collections. However, the generality of their framework concerning the class of
estimators requires in return additional regularity assumptions, which we intend to relax in
this paper. We focus on a specific class of projection estimators that extends the spectral
cut-off to non-monotonic collections of filters. Precisely, we consider the collection of
unrestricted binary filters $\lambda_i \in \{0, 1\}$. The computation of the estimator relies on
the choice of a proper set of coefficients $m \subset \{1, ..., n\}$, which considerably
increases the number of possibilities compared to the spectral cut-off procedure. We show that
this method satisfies a non-asymptotic exact oracle inequality, when the oracle is computed in
the class of binary filters. Moreover, we show that our estimator nearly achieves the rate of
convergence of the best linear estimator in the maximal class of filters, i.e. when no
restriction is made on $\lambda_i$.

In many practical situations, the operator $A_n$ is not known precisely and only an
approximation of it is available. Regularization of inverse problems with approximate operator
is studied in [6], [8] and [13]. In this paper, we tackle the problem of estimating $x_0$ in
the situation where we observe independently a noisy version $\hat b_i$ of each eigenvalue
$b_i$. We consider a new framework where the observations $\hat b_i$ are made once and for all,
and are seen as non-random. We provide a bound on the conditional risk of the estimator, given
the values of $\hat b_i$, in the form of a conditional oracle inequality.

The paper is organized as follows. We introduce the problem in Section 2. We define
our estimator in Section 3 and provide two types of oracle inequalities. Section 4 is
devoted to an application of the method to inverse problems with noisy operators. The
proofs of the results are postponed to the Appendix.

2 Problem setting 

Let $(X, \|.\|)$ be a Hilbert space and $A_n : X \to \mathbb{R}^n$ ($n > 2$) a linear operator.
We want to recover an unknown signal $x_0 \in X$ based on the indirect observations

$$y = A_n x_0 + \varepsilon, \qquad (1)$$

where $\varepsilon$ is a random noise vector. We assume that $\varepsilon$ is centered with
covariance matrix $\sigma^2 I$, where $I$ denotes the identity matrix. We endow $\mathbb{R}^n$
with the scalar product $\langle u, v\rangle_n = n^{-1}\sum_{i=1}^n u_i v_i$ and the associated
norm $\|.\|_n$, and we note $A_n^* : \mathbb{R}^n \to X$ the adjoint of $A_n$. Let
$\mathcal{K}_n$ be the kernel of $A_n$ and $\mathcal{K}_n^\perp$ its orthogonal complement in
$X$, which we assume to be of dimension $n$. The surjectivity of $A_n$ ensures that the
observation $y$ provides information in all directions. If this condition is not met, one may
simply reduce the dimension of the image in order to make $A_n$ surjective.

The efficiency of the estimator relies first of all on the accuracy of the discrete operator
$A_n$ and how "close" it is to the true operator $A$. The convergence of the estimator towards
$x_0$ is subject to the condition that the distance of $x_0$ to the set $\mathcal{K}_n^\perp$
tends to 0, which is reflected by a proper asymptotic behavior of the design $t_1, ..., t_n$.
This aspect is not discussed here: we consider a framework where we have no control over the
design $t_1, ..., t_n$ and we focus on the convergence of the estimator towards the projection
$x^\dagger$.

Let $\{b_i; \varphi_i, \psi_i\}_{i=1,...,n}$ be a singular system for the linear operator $A_n$,
that is, $A_n\varphi_i = b_i\psi_i$ and $A_n^*\psi_i = b_i\varphi_i$, where
$b_1^2 \ge ... \ge b_n^2 > 0$ are the ordered non-zero eigenvalues of the self-adjoint operator
$A_n^* A_n$. The $\varphi_i$'s (resp. $\psi_i$'s) form an orthonormal system of
$\mathcal{K}_n^\perp$ (resp. $\mathbb{R}^n$).

In this framework, the available information on $x_0$ consists in a noisy version of $A_n x_0$.
As a result, estimating the part of $x_0$ lying in $\mathcal{K}_n$ is impossible based only on
the observations. The best approximation of $x_0$ one can get without prior information is the
orthogonal projection of $x_0$ onto $\mathcal{K}_n^\perp$. This projection, noted $x^\dagger$,
is called the best approximate solution and is obtained as the image of $A_n x_0$ through the
generalized Moore-Penrose inverse operator $A_n^\dagger = (A_n^* A_n)^\dagger A_n^*$, where
$(A_n^* A_n)^\dagger$ denotes the inverse of $A_n^* A_n$ restricted to $\mathcal{K}_n^\perp$.
By construction, the generalized Moore-Penrose inverse $A_n^\dagger$ can also be defined as
the operator for which $\{b_i^{-1}; \psi_i, \varphi_i\}_{i=1,...,n}$ is a singular system. We
refer to [9] for further details.

Searching for a solution in the subspace $\mathcal{K}_n^\perp$ reduces the number of regressors
to $n$. Estimating $x^\dagger$ can then be done in a classical linear regression framework where
the number of regressors is equal to the dimension of the observation. Decomposing the
observation in the singular basis $\{\psi_i\}_{i=1,...,n}$ leads to the following model

$$y_i = b_i x_i + \varepsilon_i, \quad i = 1, ..., n,$$

where we set $y_i = \langle y, \psi_i\rangle_n$, $\varepsilon_i = \langle\varepsilon, \psi_i\rangle_n$
and $x_i = \langle x_0, \varphi_i\rangle$. It now suffices to divide each term by the known
singular value $b_i$ to observe the coefficient $x_i$, up to a noise term
$\eta_i := b_i^{-1}\varepsilon_i$. Equivalently, this is obtained by applying the Moore-Penrose
inverse $A_n^\dagger$ in the model (1). We thus consider the function
$y^\dagger = A_n^\dagger y \in \mathcal{K}_n^\perp$, defined as the inverse image of $y$ through
$A_n$ with minimal norm. Identifying $y^\dagger$ with the vector of its coefficients
$y_i^\dagger = \langle y^\dagger, \varphi_i\rangle$ in the basis $\{\varphi_i\}_{i=1,...,n}$, we
obtain

$$y_i^\dagger = x_i + \eta_i, \quad i = 1, ..., n. \qquad (2)$$

The covariance matrix of the noise $\eta = (\eta_1, ..., \eta_n)'$ is diagonal in this model, as
we have $\mathbb{E}(\eta_i\eta_j) = n^{-1}\sigma^2 b_i^{-1} b_j^{-1}\langle\psi_i, \psi_j\rangle_n$,
which is null for all $i \ne j$ and equal to $\sigma_i^2 := \sigma^2 b_i^{-2}/n$ if $i = j$.
Thus, the model can be interpreted as a linear regression model with heteroscedastic noises, the
variances of the $\eta_i$ being inversely proportional to the eigenvalues $b_i^2$. In the case
where $\varepsilon$ in the original model (1) is Gaussian with distribution
$\mathcal{N}(0, \sigma^2 I)$, the noises $\eta_i$ remain Gaussian in (2).
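
To make the heteroscedastic structure of model (2) concrete, the following minimal Python sketch simulates the coefficients $y_i^\dagger = x_i + \eta_i$ directly in sequence space. The Gaussian noise, the decaying singular values, the coefficient values and the name `simulate_sequence_model` are illustrative assumptions, not taken from the paper.

```python
import random

def simulate_sequence_model(x, b, sigma, rng):
    """Draw y_i^dagger = x_i + eta_i with Var(eta_i) = sigma^2 * b_i^(-2) / n.

    Gaussian noise is an assumption made for this sketch only.
    """
    n = len(x)
    # Noise variances sigma_i^2 = sigma^2 * b_i^(-2) / n
    variances = [sigma ** 2 / (n * bi ** 2) for bi in b]
    y_dagger = [xi + rng.gauss(0.0, vi ** 0.5) for xi, vi in zip(x, variances)]
    return y_dagger, variances

n = 8
b = [1.0 / (i + 1) for i in range(n)]           # hypothetical decaying singular values
x = [1.0, 0.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0]    # hypothetical coefficients of x_0
y_dagger, variances = simulate_sequence_model(x, b, sigma=1.0, rng=random.Random(0))
```

In this toy setting the noise variance grows as the singular values decay, which is the mechanism behind the ill-posedness discussed in the text.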

This representation points out the effect of the decay of the singular values $b_i$ on the
noise level, making the problem ill-posed. To control the noise when the variance $\sigma_i^2$
is too large, a solution is to consider weighted versions of $y^\dagger$. For some filter
$\lambda = (\lambda_1, ..., \lambda_n)'$, note $\hat x(\lambda) \in \mathcal{K}_n^\perp$ the
function defined by $\langle\hat x(\lambda), \varphi_i\rangle = \lambda_i y_i^\dagger$ for
$i = 1, ..., n$. Filter-based methods aim to cancel out the high frequency noises by allocating
low weights to the components $y_i^\dagger$ corresponding to small singular values. A widely
used example is the Tikhonov regularization, with weights of the form
$\lambda_i = (1 + \tau\sigma_i^2)^{-1}$ for some $\tau > 0$. The Tikhonov solution can be
expressed as the minimizer of the functional

$$\|y - A_n x\|_n^2 + \tau\|x\|^2, \quad x \in X,$$

which makes the method particularly convenient in cases where the SVD of $A_n^* A_n$ or the
coefficients $y_i^\dagger$ are not easily computable. We refer to [3] and [20] for further
details.

Another common filter-based method is the truncated singular value decomposition, or
spectral cut-off, studied in [2], [9] and [11]. An estimator of $x_0$ is obtained as a truncated
version of $y^\dagger$ where all coefficients $y_i^\dagger$ corresponding to arbitrarily small
singular values are replaced by 0. This approach can be viewed as a principal component
analysis, where only the highly explanatory directions are selected. The spectral cut-off
estimator is associated to filter factors of the form $\lambda_i = 1\{i \le k\}$, where $1\{.\}$
denotes the indicator function and $k$ is a bandwidth to be determined. Data-driven methods for
selecting suitable values of $k$ are discussed in [3], @], p], [21] and [22].

A natural way to generalize the spectral cut-off procedure is to enlarge the class of
estimators by considering non-ordered truncated versions of $y^\dagger$, as made in [TJ], [15]
or [16] (see also Examples 1 and 2 in [5]). This approach reduces to a model selection issue
where each model is identified with a set of indices $m \subset \{1, ..., n\}$. Precisely, for
$m$ a given model, define $\hat x_m \in \mathcal{K}_n^\perp$ as the orthogonal projection of
$y^\dagger$ onto $X_m := \mathrm{span}\{\varphi_i, i \in m\}$, that is, $\hat x_m$ satisfies

$$\langle\hat x_m, \varphi_i\rangle = y_i^\dagger\, 1\{i \in m\}, \quad i = 1, ..., n.$$

The objective is to find a model $m$ that makes the expected risk
$\mathbb{E}\|\hat x_m - x_0\|^2$ small. The computation of the estimator no longer relies on the
choice of one parameter $k \in \{1, ..., n\}$ as for spectral cut-off, but on the choice of a
set of indices $m \subset \{1, ..., n\}$, which increases the number of possibilities. In
particular, this approach allows non-monotonic collections of filters that may perform better
than the decreasing sequences obtained by spectral cut-off. To see this, write the bias-variance
decomposition of the estimator $\hat x_m$ for a deterministic model $m$, measured with respect
to the projection $x^\dagger$:

$$\mathbb{E}\|\hat x_m - x^\dagger\|^2 = \sum_{i \notin m} x_i^2 + \sum_{i \in m}\sigma_i^2.$$

In this setting, it appears that in order to minimize the risk, it is best to select the indices
$i$ for which the component $x_i^2$ is larger than the noise level $\sigma_i^2$. A proper choice
of filter should depend on both the variance $\sigma_i^2$ and the coefficient $x_i^2$.
Consequently, the resulting sequence $\{\lambda_i\}_{i=1,...,n}$ has no reason to be a
decreasing function of $\sigma_i^2$ if some coefficients $x_i^2$ are large enough to compensate
for a large variance.
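
The advantage of non-ordered selection over spectral cut-off can be illustrated numerically. The sketch below, in plain Python, evaluates the risk $\sum_{i \notin m} x_i^2 + \sum_{i \in m}\sigma_i^2$ of a projection estimator for every cut-off model and for the model keeping the indices with $x_i^2 > \sigma_i^2$. The coefficient and variance values are hypothetical, chosen so that a large coefficient sits at a noisy index, and indices are 0-based.

```python
def risk(model, x, variances):
    """Risk of the projection estimator: squared bias outside m plus variance inside m."""
    bias = sum(xi ** 2 for i, xi in enumerate(x) if i not in model)
    variance = sum(variances[i] for i in model)
    return bias + variance

# Hypothetical coefficients x_i and noise variances sigma_i^2 (increasing in i)
x = [1.0, 0.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0]
variances = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
n = len(x)

# Best spectral cut-off: models of the form {0, ..., k-1}
cutoff_risk = min(risk(set(range(k)), x, variances) for k in range(n + 1))

# Non-ordered selection: keep index i whenever x_i^2 > sigma_i^2
oracle = {i for i in range(n) if x[i] ** 2 > variances[i]}
oracle_risk = risk(oracle, x, variances)
```

With these values the non-ordered model skips the uninformative indices lying between the two large coefficients, which no cut-off model can do.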
3 Non-ordered variable selection

3.1 Threshold regularization

The construction of the estimator by non-ordered variable selection reduces to finding
a proper set $m$. Following the discrepancy principle, an optimal value for $m$ (minimizing
the risk) is obtained by keeping simultaneously small the bias term $\sum_{i \notin m} x_i^2$
and the variance term $\sum_{i \in m}\sigma_i^2$ in the expression of the risk
$\mathbb{E}\|\hat x_m - x_0\|^2$. Following the previous argument, a minimizer of the risk
$\mathbb{E}\|\hat x_m - x_0\|^2$ is obtained by selecting only the indices $i$ for which the
coefficient $x_i^2$ is larger than the noise level $\sigma_i^2$. An optimal model is thus
given by $m^* := \{i : x_i^2 > \sigma_i^2\}$. The coefficients $x_i$ being unknown to the
practitioner, the optimal set $m^*$ cannot be computed in practical cases. For this reason it
will be referred to as an oracle.
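
The optimality of $m^*$ can be checked term by term. The short derivation below is a sketch that restates the bias-variance decomposition, measured with respect to the projection $x^\dagger$, and bounds each index separately:

```latex
\mathbb{E}\|\hat x_m - x^\dagger\|^2
  = \sum_{i \in m} \sigma_i^2 + \sum_{i \notin m} x_i^2
  \;\ge\; \sum_{i=1}^n \min\{x_i^2, \sigma_i^2\},
\qquad \text{with equality for } m = m^* = \{i : x_i^2 > \sigma_i^2\}.
```

Each index contributes either $\sigma_i^2$ (if selected) or $x_i^2$ (if discarded), so the risk is minimized by paying the smaller of the two for every $i$.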

We shall now provide a model $\hat m$ constructed from the available information, that
mimics the oracle $m^*$. Fixing a threshold on the coefficients $x_i$ being impossible, we
propose to use a threshold on the coefficients $y_i^\dagger$. Precisely, consider the set

$$\hat m = \{i : (y_i^\dagger)^2 > 4\sigma_i^2\mu_i\},$$

for $\{\mu_i\}_{i=1,...,n}$ a sequence of non-negative parameters to be chosen. Obviously, the
behavior of the resulting estimator $\hat x_{\hat m}$ relies on the choice of the sequence
$\{\mu_i\}_{i=1,...,n}$: the larger the $\mu_i$'s, the sparser $\hat x_{\hat m}$. The sequence
must be chosen so that the resulting set $\hat m$ contains only the indices $i$ for which the
noise level is small compared to the actual value of $x_i$. However, this is a difficult task
when only the observations $y_i^\dagger$ and the variances $\sigma_i^2$ are known.
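
Once the coefficients $y_i^\dagger$ and the variances $\sigma_i^2$ are available, the selection step is a simple comparison. The following Python sketch assumes the rule $\hat m = \{i : (y_i^\dagger)^2 > 4\sigma_i^2\mu_i\}$, consistent with the threshold $4\sigma_i^2\mu_i$ used at the end of this section. The function names and the numerical values are hypothetical, and indices are 0-based.

```python
def select_model(y_dagger, variances, mu):
    """Return the set {i : (y_i^dagger)^2 > 4 * sigma_i^2 * mu_i}."""
    return {i for i, (yi, vi, mi) in enumerate(zip(y_dagger, variances, mu))
            if yi ** 2 > 4.0 * vi * mi}

def threshold_estimator(y_dagger, model):
    """Coefficients of the projection estimator: y_i^dagger on the model, 0 elsewhere."""
    return [yi if i in model else 0.0 for i, yi in enumerate(y_dagger)]

y_dagger = [1.2, -0.1, 0.9, 0.05]      # hypothetical observed coefficients
variances = [0.01, 0.04, 0.25, 1.0]    # hypothetical noise variances sigma_i^2
mu = [0.5, 0.5, 0.5, 0.5]              # hypothetical constant sequence mu_i
m_hat = select_model(y_dagger, variances, mu)
x_hat = threshold_estimator(y_dagger, m_hat)
```

Note that the third coefficient is kept although its variance is larger than that of the second one, so the resulting filter is not monotonic in $\sigma_i^2$.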

There exist general filter-based methods that can be applied to arbitrary classes of
filter estimators. One example is the unbiased risk estimation discussed in [5], which
defines an estimator of $x_0$ via the minimization of an unbiased estimate of the risk
over an arbitrary set of filters. When restricted to the class of binary filters
$\lambda_i \in \{0, 1\}$, unbiased risk estimation reduces to minimizing over the collection
$\mathcal{M}$ of all models $m \subset \{1, ..., n\}$ the criterion

$$m \mapsto \|y^\dagger - \hat x_m\|^2 + 2\sum_{i \in m}\sigma_i^2.$$

The minimum can be shown to be reached for the set $\hat m = \{i : (y_i^\dagger)^2 > 2\sigma_i^2\}$,
which corresponds to taking $\mu_i = 1/2$ in our method. This choice is shown to be
asymptotically efficient in Proposition 2 in [5], although additional restrictions are made on
the $\lambda_i$'s, which we intend to relax here in an asymptotic framework. If these conditions
are not met, the accuracy of the choice $\mu_i = 1/2$ is not clear. We investigate in the next
section a different choice for $\mu_i$ which turns out to be nearly optimal in a general
framework.
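
The form of the minimizer follows from a coordinate-wise argument. As a sketch, using $\langle\hat x_m, \varphi_i\rangle = y_i^\dagger 1\{i \in m\}$, the unbiased risk criterion restricted to binary filters decomposes as

```latex
\|y^\dagger - \hat x_m\|^2 + 2\sum_{i \in m} \sigma_i^2
  = \sum_{i \notin m} (y_i^\dagger)^2 + \sum_{i \in m} 2\sigma_i^2
  \;\ge\; \sum_{i=1}^n \min\{(y_i^\dagger)^2, 2\sigma_i^2\},
```

with equality when $m$ contains exactly the indices such that $(y_i^\dagger)^2 > 2\sigma_i^2$, which is the threshold rule with $\mu_i = 1/2$.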
From a general point of view, the estimator $\hat x_{\hat m}$ can be obtained via a
minimization procedure, using a BIC-type criterion for heteroscedastic models,

$$\hat x_{\hat m} = \arg\min_{\hat x_m,\ m \in \mathcal{M}} \Big\{\|y^\dagger - \hat x_m\|^2 + 4\sum_{i \in m}\mu_i\sigma_i^2\Big\}.$$

In a certain way, this can be seen as a hard-thresholding version of the estimator considered
in [16], obtained with an $\ell^1$ penalty. However, expressing the estimator as the solution
to a minimization problem does not ease the computation. The method requires in any case the
calculation of the SVD of $A_n^* A_n$ and of the coefficients $y_i^\dagger$, which may be
computationally expensive. On the other hand, the computation of the estimator is simple once
the decomposition of $y^\dagger$ in the SVD of $A_n^* A_n$ is known, as it suffices to compare
each coefficient $(y_i^\dagger)^2$ to the threshold $4\sigma_i^2\mu_i$.
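
The equivalence between a penalized minimization and direct thresholding can be checked by brute force on a tiny example. The Python sketch below assumes a penalized criterion of the form $\sum_{i \notin m}(y_i^\dagger)^2 + 4\sum_{i \in m}\mu_i\sigma_i^2$, which is one reading of the BIC-type criterion compatible with the threshold $4\sigma_i^2\mu_i$ and is stated here as an assumption. All numerical values are hypothetical.

```python
from itertools import combinations

def penalized_criterion(model, y_dagger, variances, mu):
    """Residual sum of squares outside the model plus the penalty 4 * sum mu_i * sigma_i^2."""
    residual = sum(yi ** 2 for i, yi in enumerate(y_dagger) if i not in model)
    penalty = 4.0 * sum(mu[i] * variances[i] for i in model)
    return residual + penalty

y_dagger = [1.2, -0.1, 0.9, 0.05]
variances = [0.01, 0.04, 0.25, 1.0]
mu = [1.0, 0.5, 0.5, 2.0]
n = len(y_dagger)

# Exhaustive search over all 2^n models (only feasible for tiny n)
all_models = [set(c) for k in range(n + 1) for c in combinations(range(n), k)]
best = min(all_models, key=lambda m: penalized_criterion(m, y_dagger, variances, mu))

# Direct thresholding of each coefficient
thresholded = {i for i in range(n) if y_dagger[i] ** 2 > 4.0 * variances[i] * mu[i]}
```

The exhaustive search visits $2^n$ models while thresholding needs $n$ comparisons, which illustrates why the minimization form brings no computational benefit.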

3.2 Oracle inequalities 

In the definition of $\hat m$, the choice of the parameters $\mu_i$ is crucial. Too large
values of $\mu_i$ will result in an under-adjustment, keeping too few relevant components
$y_i^\dagger$ to estimate $x_0$. On the contrary, a small value of $\mu_i$ increases the
probability of selecting a component $y_i^\dagger$ that is highly affected by noise. Thus, it is
essential to find a good balance between these two types of errors. In the next theorem, we
provide a nearly optimal choice for the parameters $\mu_i$, under the condition that
$\varepsilon$ has finite exponential moments.

For $i = 1, ..., n$, note $\gamma_i := \eta_i^2/\sigma_i^2 = n\varepsilon_i^2/\sigma^2$. We make
the following assumption.

A1. There exist $K, \beta > 0$ such that $\forall t > 0$, $\forall i = 1, ..., n$,
$\mathbb{P}(\gamma_i > t) \le K e^{-t/\beta}$.

In a Gaussian model, the $\gamma_i$'s have a $\chi^2$ distribution with one degree of freedom.
The condition A1 then holds for any $\beta > 2$, taking $K = (1 - 2/\beta)^{-1/2}$.
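
The Gaussian case of A1 can be checked numerically. The sketch below assumes the constant $K = (1 - 2/\beta)^{-1/2}$, which is what Markov's inequality applied to $e^{\gamma_i/\beta}$ gives for a $\chi^2$ variable with one degree of freedom, and compares the exact tail with the exponential bound on a grid. The value of $\beta$ and the grid are arbitrary choices.

```python
import math

def chi2_tail(t):
    """P(gamma > t) for gamma following a chi-square law with one degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))

beta = 3.0                                   # any beta > 2 (assumed value)
K = (1.0 - 2.0 / beta) ** -0.5               # constant obtained from Markov's inequality
grid = [0.01 * k for k in range(1, 5001)]    # t ranging over (0, 50]
holds = all(chi2_tail(t) <= K * math.exp(-t / beta) for t in grid)
```

Values of $\beta$ close to 2 give a sharper exponential rate but a larger constant $K$.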

Theorem 3.1 Assume that the condition A1 holds. Set
$\mu_i = \max\{\beta\log(n^2\sigma_i^2), 0\}$. The estimator $\hat x_{\hat m}$ satisfies

$$\mathbb{E}\|\hat x_{\hat m} - x^\dagger\|^2 \le \mathbb{E}\|\hat x_{m^*} - x^\dagger\|^2 + (K_1\log n + K_2)\sum_{i \in m^*}\sigma_i^2 + \frac{K_3}{n},$$

with $K_1 = 12\beta$, $K_2 = 2 + 6\beta\log\|x^\dagger\|^2$ and $K_3 = 2K\beta$.

Remark 1. This theorem establishes a non-asymptotic oracle inequality with exact
constant. The residual term is similar to that in Corollary 1 in [14]. The fact that the
term $\|x^\dagger\|$ depends on $n$ is not problematic here, as it can in any case be bounded
by the norm of $x_0$.



Remark 2. The method requires knowledge of the operator $A_n$, the variance $\sigma^2$ and
the constant $\beta$ in the condition A1. Note however that knowing the constant $K$ is not
necessary to build the estimator.

Remark 3. The set $\hat m$ contains all indices $i$ for which $\sigma_i^2 \le 1/n^2$, as we have
in this case $\mu_i = 0$. This suggests that the error caused by wrongfully selecting indices
$i$ for which the variance is smaller than $1/n^2$ is negligible, regardless of the value of
$y_i^\dagger$.
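
The thresholds of Theorem 3.1 are straightforward to compute. The following Python sketch evaluates $\mu_i = \max\{\beta\log(n^2\sigma_i^2), 0\}$ and shows the behavior described in Remark 3: indices with $\sigma_i^2 \le 1/n^2$ receive $\mu_i = 0$. The function name and the values of $\sigma_i^2$ and $\beta$ are hypothetical.

```python
import math

def thresholds_mu(variances, beta):
    """mu_i = max(beta * log(n^2 * sigma_i^2), 0) for each noise variance sigma_i^2."""
    n = len(variances)
    return [max(beta * math.log(n ** 2 * v), 0.0) for v in variances]

variances = [0.001, 0.0625, 0.5, 4.0]   # hypothetical sigma_i^2, here n = 4
mu = thresholds_mu(variances, beta=3.0)
```

Here the first two variances are at most $1/n^2 = 0.0625$, so the corresponding coefficients are always kept, while the thresholds grow logarithmically with the variance for the remaining indices.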

Remark 4. From an asymptotic point of view, the accuracy of the result stated in Theorem 3.1
relies on the convergence rate of the residual term to zero, compared to the risk of
the oracle. The residual term $\sum_{i \in m^*}\sigma_i^2$ is actually the variance term in the
bias-variance decomposition of $\hat x_{m^*}$, and therefore it is bounded by the risk of the
oracle. As a result, the estimator $\hat x_{\hat m}$ is shown to reach at least the same rate of
convergence as the oracle, up to a logarithmic term, which warrants good adaptivity properties.
The logarithmic term vanishes in the convergence rate if the bias term
$\sum_{i \notin m^*} x_i^2$ dominates in the risk of the oracle $\hat x_{m^*}$. Precisely, the
oracle inequality is asymptotically exact as soon as the residual term
$\log n\sum_{i \in m^*}\sigma_i^2$ is negligible compared to the bias term
$\sum_{i \notin m^*} x_i^2$. In this case, it follows from Theorem 3.1 that

$$\mathbb{E}\|\hat x_{\hat m} - x^\dagger\|^2 = (1 + o(1))\,\mathbb{E}\|\hat x_{m^*} - x^\dagger\|^2.$$

Of course, this condition is hard to verify in practice, and assuming it holds amounts to
making strong regularity assumptions on the asymptotic behavior of $x_0$ and $A_n$. In a
non-asymptotic framework, the theorem warrants that the estimator $\hat x_{\hat m}$ is close to
the oracle as soon as the variance term $\sum_{i \in m^*}\sigma_i^2$ is small compared to the
bias term $\sum_{i \notin m^*} x_i^2$ in the bias-variance decomposition of the oracle.

The estimator $\hat x_{\hat m}$ being built using binary filters $\lambda_i \in \{0, 1\}$, it is
natural to measure its efficiency by comparing its risk to that of the best linear estimator in
this class. Nevertheless, we see in the next corollary that a similar oracle inequality holds if
we consider the oracle in the maximal class of filters, that is, allowing the $\lambda_i$'s to
take any real value.

Corollary 3.2 Assume that the condition A1 holds. The estimator $\hat x_{\hat m}$ of
Theorem 3.1 satisfies

$$\mathbb{E}\|\hat x_{\hat m} - x^\dagger\|^2 \le K_4\log n \inf_{\lambda \in \mathbb{R}^n}\mathbb{E}\|\hat x(\lambda) - x^\dagger\|^2 + \frac{K_5}{n},$$

for some constants $K_4, K_5$ independent of $n$.

This result is a straightforward consequence of Lemma 5.2 in the Appendix, where it is
shown that the oracle in the class of binary filters $\lambda_i \in \{0, 1\}$ achieves the same
rate of convergence, up to a factor 2, as the best filter estimator obtained with non-random
values of $\lambda$. The class of unrestricted binary filters thus leads to a simple solution,
at the cost of a slight loss of efficiency compared to the maximal class.

The interest of oracles lies in the fact that the best estimator in a given class will often
reach the optimal rate of convergence. In many situations, comparing the risk of the
estimator to that of an oracle might be sufficient to deduce optimality results, as well as
adaptivity properties, as discussed in [3]. In the literature on inverse problems, rates of
convergence of oracles are obtained under regularity conditions on the map $x_0$ and the
spectrum of $A_n$. These conditions can be gathered into a single assumption, generally
referred to as a source condition, relating the behavior of $x_0$ to the regularity of the
operator $A_n$ (see for instance [2], [9] or [ID]). Another point of view widely adopted in the
literature is the minimax approach (see [3]), aiming to determine the behavior of the estimator
for the worst possible value of $x_0$ in a given class of functions. Typically, the condition
can be a polynomial decay of the coefficients $x_i$, which reduces to assuming that $x_0$ lies
in the unit ball of a proper Besov space. For rates of convergence with a minimax approach, we
refer to pQ, [7] and [UJ. In our framework, rates of convergence for $\hat x_{\hat m}$ can be
deduced from Theorem 2 in [H], under a polynomial decay of the coefficients $x_i$ and the
eigenvalues $b_i$.

4 Regularization with unknown operator 

In many practical situations, the operator $A_n$ is not precisely known. In this section, we
consider the framework where the operator $A_n$ is observed independently from $y$. This
situation is treated in [6], [8] or [13]. The method discussed in the previous section does
not apply to such problems since it requires complete knowledge of the operator $A_n$.
As in [6], we assume that the eigenvectors $\varphi_i$ and $\psi_i$ are known. This seemingly
strong assumption is actually met in many situations, for instance if the problem involves
convolution or differential operators which can be decomposed in a Fourier basis (see also the
examples in [3]). Thus, only the eigenvalues $b_i$ are unknown and we assume they are
observed independently of $y$, with a centered noise $\xi_i$ with known variance $s^2 > 0$:

$$\hat b_i = b_i + \xi_i, \quad i = 1, ..., n.$$

The method discussed in this paper differs according to whether the eigenvalues are
known exactly or observed with noise. Thus, we need to assume here that $s$ is positive,
and the known operator framework cannot be seen as a particular case. Moreover, we
assume the $\xi_i$'s are independent and satisfy the two following conditions.

A2. There exist $K', \beta' > 0$ such that $\forall t > 0$, $\forall i = 1, ..., n$,
$\mathbb{P}(\xi_i^2/s^2 > t) \le K' e^{-t/\beta'}$.

A3. There exist $C, a > 0$ such that $\forall i = 1, ..., n$,
$\min\{\mathbb{P}(\xi_i < -as), \mathbb{P}(\xi_i > as)\} \ge C$.

As discussed previously, the condition A2 means that the $\xi_i$'s have finite exponential
moments. The condition A3 is hardly restrictive, and is fulfilled for instance as soon as
the $\xi_i$'s are identically distributed. As we shall see in the sequel, the method requires
knowledge of the constant $a$ (or at least an upper bound for it), but no information on
the constants $\beta'$, $K'$ or $C$ is needed to build the estimator.

Knowing the eigenvectors of $A_n^* A_n$ allows us to write the model in the form

$$y_i = b_i x_i + \varepsilon_i, \quad i = 1, ..., n.$$

In our framework where the actual eigenvalues $b_i$ are unknown, a natural estimator of
each component $x_i$ is obtained by $\tilde y_i = \hat b_i^{-1} y_i$, provided that
$\hat b_i \ne 0$. However, it is clear that this estimate is not satisfactory if $\hat b_i$ is
far from the true value (consider for instance the extreme case where $\hat b_i = 0$, or where
$\hat b_i$ and $b_i$ are of opposite signs). Actually, the naive estimator $\hat b_i^{-1}$
cannot be used efficiently to estimate $b_i^{-1}$ because it may have an infinite variance. In
[6], the authors fix a threshold $w$ that the estimate cannot exceed and consider an estimator
of $b_i^{-1}$ equal to $\hat b_i^{-1}$ if $|\hat b_i| > 1/w$ and null otherwise. As we will see
below, we use the same idea here, although the threshold fixed on the $\hat b_i$'s is implicitly
part of the variable selection process.

We can reasonably assume that null values of $\hat b_i$ do not provide any relevant information
and cannot be used to estimate $x_0$. Thus, to avoid considering trivial situations,
we assume that all $\hat b_i$ are non-zero. In all generality, the $\tilde y_i$'s can be viewed
as noisy observations of $x_i$ by writing

$$\tilde y_i = x_i + \tilde\eta_i, \quad i = 1, ..., n,$$

with $\tilde y_i = \hat b_i^{-1}\langle y, \psi_i\rangle_n$ and
$\tilde\eta_i = \hat b_i^{-1}(\varepsilon_i - \xi_i x_i)$, where we recall
$\varepsilon_i = \langle\varepsilon, \psi_i\rangle_n$. As in the previous section, we propose a
threshold procedure to filter out the observations $\tilde y_i$ that are potentially highly
contaminated with noise. Here, the noise $\tilde\eta_i$ is more difficult to deal with because
it depends on the unknown coefficient $x_i$.

Our objective is to find an optimal variable selection criterion conditionally to the
$\hat b_i$'s. In order to do so, we consider a framework where the $\hat b_i$'s are observed
once and for all, and are treated as non-random. Thus, we define as an oracle a model
$m^*_\xi$ minimizing the conditional risk $\mathbb{E}_\xi\|\hat x_m - x^\dagger\|^2$, where
$\mathbb{E}_\xi(.)$ denotes the expectation knowing $\xi = (\xi_1, ..., \xi_n)'$. Following a
similar argument as in the previous section, a model minimizing the conditional risk contains
only the indices $i$ for which the coefficient $x_i^2$ is larger than the noise level. Hence, we
may define $m^*_\xi = \{i : x_i^2 > \mathbb{E}_\xi(\tilde\eta_i^2)\}$. A notable difference here
is that the noise $\tilde\eta_i$ actually depends on the value $x_i$. Let
$\tilde\sigma_i^2 = n^{-1}\hat b_i^{-2}\sigma^2$; we can calculate the conditional expectation
of $\tilde\eta_i^2$, given by
$\mathbb{E}_\xi(\tilde\eta_i^2) = \tilde\sigma_i^2 + \hat b_i^{-2}\xi_i^2 x_i^2$. After
simplifications, it appears that the optimal model conditionally to the $\xi_i$'s can be
expressed in the two following explicit forms

$$m^*_\xi = \Big\{i : 2|\hat b_i| > \frac{\sigma^2}{n x_i^2 |b_i|} + |b_i|\Big\} = \Big\{i : 2|\hat b_i| > |b_i|,\ x_i^2 > \frac{\sigma^2}{n |b_i| (2|\hat b_i| - |b_i|)}\Big\}.$$
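
The simplification leading to the explicit forms of $m^*_\xi$ can be sketched as follows, using $\tilde\sigma_i^2 = \sigma^2/(n\hat b_i^2)$ and $\xi_i = \hat b_i - b_i$:

```latex
x_i^2 > \mathbb{E}_\xi(\tilde\eta_i^2)
      = \frac{\sigma^2}{n \hat b_i^2} + \frac{\xi_i^2 x_i^2}{\hat b_i^2}
\iff x_i^2 \, (\hat b_i^2 - \xi_i^2) > \frac{\sigma^2}{n},
\qquad
\hat b_i^2 - \xi_i^2 = \hat b_i^2 - (\hat b_i - b_i)^2 = b_i \, (2 \hat b_i - b_i).
```

Under the additional assumption (made here for the sketch) that $\hat b_i$ and $b_i$ have the same sign, $b_i(2\hat b_i - b_i) = |b_i|(2|\hat b_i| - |b_i|)$, and dividing by $x_i^2 |b_i|$ gives the condition $2|\hat b_i| > \sigma^2/(n x_i^2 |b_i|) + |b_i|$.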




In the first expression, we see that the oracle selects the indices $i$ for which the
observation $\hat b_i$ exceeds a certain value depending on both $x_i$ and $b_i$. Interestingly,
components $\tilde y_i$ corresponding to observations $\hat b_i$ smaller than half the true
eigenvalue $b_i$ are not selected in the oracle, regardless of the coefficient $x_i$. Here
again, the optimal model $m^*_\xi$ cannot be used in practical cases since it involves the
unknown values $x_i$ and $\xi_i$. We can only try to mimic the optimal threshold, based on the
observations $\tilde y_i$ and $\hat b_i$. Consider the set

$$\hat m_\xi = \{i : \tilde y_i^2 > 8\tilde\sigma_i^2\nu_i,\ |\hat b_i| > as\},$$

where $\{\nu_i\}_{i=1,...,n}$ are parameters to be chosen and $a$ is the constant defined in A3.
With this definition, only the indices for which the observation $\hat b_i$ is larger in
absolute value than a certain value, namely $as$, are selected. This conveys the idea discussed
in [6], that when $b_i$ is small compared to the noise level, the observation $\hat b_i$ is
potentially mainly noise. Remark however that in [B], the lower limit for the observed
eigenvalues is $s\log^2(1/s)$, while in our method it is chosen of the same order as the
standard deviation $s$.
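
The two-stage selection rule can be sketched in a few lines of Python. The code below assumes the set $\hat m_\xi = \{i : \tilde y_i^2 > 8\tilde\sigma_i^2\nu_i,\ |\hat b_i| > as\}$ with $\tilde y_i = y_i/\hat b_i$ and $\tilde\sigma_i^2 = \sigma^2/(n\hat b_i^2)$. The function name, the parameter values and the data are hypothetical, and indices are 0-based.

```python
def select_model_noisy(y, b_hat, nu, sigma, s, a):
    """Indices i with |b_hat_i| > a*s and y_tilde_i^2 > 8 * sigma_tilde_i^2 * nu_i."""
    n = len(y)
    model = set()
    for i in range(n):
        if abs(b_hat[i]) <= a * s:
            continue                                   # observed eigenvalue is mostly noise
        y_tilde = y[i] / b_hat[i]                      # naive estimate of x_i
        var_tilde = sigma ** 2 / (n * b_hat[i] ** 2)   # sigma_tilde_i^2
        if y_tilde ** 2 > 8.0 * var_tilde * nu[i]:
            model.add(i)
    return model

y = [2.0, 2.0, 0.3, 0.01]          # hypothetical coefficients y_i = <y, psi_i>_n
b_hat = [1.0, 0.5, 0.05, -0.4]     # hypothetical noisy eigenvalues
nu = [1.0, 1.0, 1.0, 1.0]          # hypothetical parameters nu_i
m_hat = select_model_noisy(y, b_hat, nu, sigma=1.0, s=0.1, a=1.0)
```

The third index is discarded by the eigenvalue test alone, before its coefficient is even examined, which is how the threshold on the $\hat b_i$'s becomes part of the selection process.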



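Define the set $M = \{i : |b_i| < 2as\}$.

Theorem 4.1 Assume that the condition A1 holds. The threshold estimator
$\hat x_{\hat m_\xi}$ obtained with $\nu_i = \max\{\beta\log(n^2\tilde\sigma_i^2), 0\}$
satisfies

$$\mathbb{E}_\xi\|\hat x_{\hat m_\xi} - x^\dagger\|^2 \le (K'_1\log n + K'_2)\,\mathbb{E}_\xi\|\hat x_{m^*_\xi} - x^\dagger\|^2 + \sum_{i \in M} x_i^2 + \kappa(\xi),$$

with $K'_1 = \max\{18\beta, 4a^{-2}\beta'\}$, $K'_2 = \max\{9(\beta\log\|x^\dagger\|^2 + 1), 1\}$, and

$$\kappa(\xi) = \frac{4K\beta}{n} + 4\sum_{i \notin m^*_\xi}\frac{x_i^2\xi_i^2}{a^2 s^2}\,1\{\xi_i^2 > s^2\beta'\log n\}.$$

Moreover, if A2 holds, $\mathbb{E}(\kappa(\xi)) = O(n^{-1}\log n)$.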

The main interest of this result lies in the fact that it provides an oracle inequality
conditionally to the $\hat b_i$'s. In particular, the conditional oracle $\hat x_{m^*_\xi}$ is
more efficient than the estimator obtained by minimizing the expected risk
$m \mapsto \mathbb{E}\|\hat x_m - x^\dagger\|^2$, since the optimal set $m^*_\xi$ is allowed to
depend on the $\xi_i$'s. We see that the estimator performs almost as well as the conditional
oracle. Indeed, the residual term $\kappa(\xi)$ is independent from $\xi$ with high probability,
and its expectation is negligible under A2 as pointed out in the theorem. The non-random term
$\sum_{i \in M} x_i^2$ is small if the eigenvalues $b_i$ are observed with a good precision,
i.e. if the variance $s^2$ is small. Moreover, this term can be shown to be of the same order
as the risk under the condition A3.

Corollary 4.2 If the conditions A1, A2 and A3 hold, the threshold estimator defined in
Theorem 4.1 satisfies

$$\mathbb{E}\|\hat x_{\hat m_\xi} - x^\dagger\|^2 \le K'_4\log n\ \mathbb{E}\|\hat x_{m^*_\xi} - x^\dagger\|^2 + K'_5\,\frac{\log n}{n},$$

for some constants $K'_4$ and $K'_5$ independent of $n$ and $s^2$.

With a noisy operator, we manage to provide an estimator that achieves the rate of
convergence of the conditional oracle, regardless of the precision of the approximation of
the spectrum of $A_n$. Indeed, the constants $K'_4$ and $K'_5$ in Corollary 4.2 do not involve
the variance $s^2$ of $\xi$. Actually, the variance only plays a role in the accuracy of the
oracle. The result is non-asymptotic and requires no assumption on $s^2$.



5 Appendix 

5.1 Technical lemmas 

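Lemma 5.1 Assume the condition A1 holds. We have

• $\mathbb{E}((\eta_i^2 - x_i^2)1\{i \in \hat m\}) \le 2K\beta\sigma_i^2 e^{-\mu_i/\beta}$.
• $\mathbb{E}((x_i^2 - \eta_i^2)1\{i \notin \hat m\}) \le \sigma_i^2(6\mu_i + 2)$.

Proof. Using the inequality $(a + b)^2 \le 2a^2 + 2b^2$, we find that
$\eta_i^2 - x_i^2 \le 2\eta_i^2 - (y_i^\dagger)^2/2$. By definition of $\hat m$, we get

$$(\eta_i^2 - x_i^2)1\{i \in \hat m\} \le 2\sigma_i^2(\gamma_i - \mu_i)1\{i \in \hat m\} \le 2\sigma_i^2(\gamma_i - \mu_i)1\{\gamma_i > \mu_i\},$$

where we used that $X \le X 1\{X > 0\}$. We finally obtain, for all $i \notin m^*$,

$$\mathbb{E}((\eta_i^2 - x_i^2)1\{i \in \hat m\}) \le 2\sigma_i^2\int_0^\infty\mathbb{P}(\gamma_i > t + \mu_i)\,dt \le 2K\beta\sigma_i^2 e^{-\mu_i/\beta},$$

as a consequence of A1. For the second part of the lemma, write
$x_i^2 - \eta_i^2 = (y_i^\dagger)^2 - 2\eta_i y_i^\dagger$, which is bounded by
$3(y_i^\dagger)^2/2 + 2\eta_i^2$, using the inequality $2ab \le 2a^2 + b^2/2$. Since
$(y_i^\dagger)^2 \le 4\sigma_i^2\mu_i$ for $i \notin \hat m$ and
$\mathbb{E}(\eta_i^2) = \sigma_i^2$, this leads to

$$\mathbb{E}((x_i^2 - \eta_i^2)1\{i \notin \hat m\}) \le \sigma_i^2(6\mu_i + 2).$$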
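Lemma 5.2

$$\inf_{m \in \mathcal{M}}\mathbb{E}\|\hat x_m - x^\dagger\|^2 \le 2\inf_{\lambda \in \mathbb{R}^n}\mathbb{E}\|\hat x(\lambda) - x^\dagger\|^2.$$

Proof. The minimal values of the expected risks can be calculated explicitly in the two
classes considered here. Minimizing over $\mathbb{R}^n$ the function
$\lambda \mapsto \mathbb{E}\|\hat x(\lambda) - x^\dagger\|^2$, we find that the optimal value of
$\lambda_i$ is reached for $\lambda_i^* = x_i^2/(x_i^2 + \sigma_i^2)$. On the other hand, we
know that $m \mapsto \mathbb{E}\|\hat x_m - x^\dagger\|^2$ reaches its minimum at
$m^* = \{i : x_i^2 > \sigma_i^2\}$, yielding

$$\inf_{\lambda \in \mathbb{R}^n}\mathbb{E}\|\hat x(\lambda) - x^\dagger\|^2 = \sum_{i=1}^n\frac{x_i^2\sigma_i^2}{x_i^2 + \sigma_i^2} \quad\text{and}\quad \inf_{m \in \mathcal{M}}\mathbb{E}\|\hat x_m - x^\dagger\|^2 = \sum_{i \in m^*}\sigma_i^2 + \sum_{i \notin m^*} x_i^2.$$

By definition, if $i \in m^*$, $2x_i^2/(x_i^2 + \sigma_i^2) \ge 1$. In the same way,
$2\sigma_i^2/(x_i^2 + \sigma_i^2) \ge 1$ for all $i \notin m^*$. We conclude by summing all the
terms.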
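
The factor 2 in Lemma 5.2 can be checked numerically. The Python sketch below uses the two closed-form minimal risks appearing in the proof, $\sum_i \min\{x_i^2, \sigma_i^2\}$ for binary filters and $\sum_i x_i^2\sigma_i^2/(x_i^2 + \sigma_i^2)$ for real-valued filters, on randomly drawn configurations. The sampling ranges and function names are arbitrary, hypothetical choices.

```python
import random

def oracle_risk_binary(x, variances):
    """Minimal risk over models m: each index costs min(x_i^2, sigma_i^2)."""
    return sum(min(xi ** 2, vi) for xi, vi in zip(x, variances))

def oracle_risk_linear(x, variances):
    """Minimal risk over real filters: each index costs x_i^2 sigma_i^2 / (x_i^2 + sigma_i^2)."""
    return sum(xi ** 2 * vi / (xi ** 2 + vi) for xi, vi in zip(x, variances))

rng = random.Random(1)
ratios = []
for _ in range(1000):
    x = [rng.uniform(-2.0, 2.0) for _ in range(10)]
    variances = [rng.uniform(0.01, 4.0) for _ in range(10)]
    ratios.append(oracle_risk_binary(x, variances) / oracle_risk_linear(x, variances))
worst = max(ratios)
```

The ratio approaches 2 only when $x_i^2$ and $\sigma_i^2$ are of the same size for most indices, and is close to 1 when signal and noise are well separated.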



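Lemma 5.3 Assume the condition A1 holds. We have, for all $i = 1, ..., n$,

• $\mathbb{E}_\xi((\tilde\eta_i^2 - x_i^2)1\{i \in \hat m_\xi\}) \le 4K\beta\tilde\sigma_i^2 e^{-\nu_i/\beta} + \dfrac{4\xi_i^2 x_i^2}{a^2 s^2}$.
• $\mathbb{E}_\xi((x_i^2 - \tilde\eta_i^2)1\{i \notin \hat m_\xi\}) \le 9\tilde\sigma_i^2\nu_i + 8\mathbb{E}_\xi(\tilde\eta_i^2) + x_i^2 1\{|\hat b_i| \le as\}$.

Proof. Remark that
$\tilde\eta_i^2 = \hat b_i^{-2}(\varepsilon_i - \xi_i x_i)^2 \le 2\hat b_i^{-2}\varepsilon_i^2 + 2\hat b_i^{-2}\xi_i^2 x_i^2$.
Using that $x_i^2 \ge \tilde y_i^2/2 - \tilde\eta_i^2$, we deduce

$$\tilde\eta_i^2 - x_i^2 \le 4\hat b_i^{-2}\varepsilon_i^2 + 4\hat b_i^{-2}\xi_i^2 x_i^2 - \frac{\tilde y_i^2}{2}.$$

Writing $\{i \in \hat m_\xi\} = \{\tilde y_i^2 > 8\tilde\sigma_i^2\nu_i\} \cap \{|\hat b_i| > as\}$, we find

$$(\tilde\eta_i^2 - x_i^2)1\{i \in \hat m_\xi\} \le 4\tilde\sigma_i^2(\gamma_i - \nu_i)1\{\gamma_i > \nu_i\} + 4\hat b_i^{-2}\xi_i^2 x_i^2 1\{|\hat b_i| > as\},$$

where we recall that $\gamma_i = n\varepsilon_i^2/\sigma^2$. Clearly,
$\hat b_i^{-2}1\{|\hat b_i| > as\} \le a^{-2}s^{-2}$ and the result follows using the condition
A1. For the second part of the lemma, remark that the complement of $\hat m_\xi$ is
$\{\tilde y_i^2 \le 8\tilde\sigma_i^2\nu_i, |\hat b_i| > as\} \cup \{|\hat b_i| \le as\}$. Using
the inequality $x_i^2 - \tilde\eta_i^2 \le (1 + \theta^{-1})\tilde y_i^2 + \theta\tilde\eta_i^2$
for $\theta = 8$, we get

$$(x_i^2 - \tilde\eta_i^2)1\{i \notin \hat m_\xi\} \le 9\tilde\sigma_i^2\nu_i + 8\tilde\eta_i^2 + x_i^2 1\{|\hat b_i| \le as\}.$$

Lemma 5.4 If A2 holds, we have

$$\xi_i^2 \le s^2\beta'\log n + \xi_i^2 1\{\xi_i^2 > s^2\beta'\log n\},$$

with $\mathbb{E}(\xi_i^2 1\{\xi_i^2 > s^2\beta'\log n\}) = O(n^{-1}\log n)$.

Proof. Write
$\xi_i^2 \le s^2\beta'\log n\,1\{\xi_i^2 \le s^2\beta'\log n\} + \xi_i^2 1\{\xi_i^2 > s^2\beta'\log n\}$.
To bound the first term, we use the crude inequality $1\{\xi_i^2 \le s^2\beta'\log n\} \le 1$.
For the second term, we have as a consequence of A2,

$$\mathbb{E}(\xi_i^2 1\{\xi_i^2 > s^2\beta'\log n\}) = \int_0^\infty\mathbb{P}(\xi_i^2 1\{\xi_i^2/s^2 > \beta'\log n\} > t)\,dt$$
$$= s^2\beta'\log n\ \mathbb{P}(\xi_i^2/s^2 > \beta'\log n) + s^2\int_{\beta'\log n}^\infty\mathbb{P}(\xi_i^2/s^2 > t)\,dt \le \frac{K'\beta' s^2(1 + \log n)}{n}.$$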

5.2 Proofs 

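Proof of Theorem 3.1. Write

$$\|\hat x_{\hat m} - x^\dagger\|^2 = \|\hat x_{m^*} - x^\dagger\|^2 + \sum_{i \notin m^*}(\eta_i^2 - x_i^2)1\{i \in \hat m\} + \sum_{i \in m^*}(x_i^2 - \eta_i^2)1\{i \notin \hat m\}.$$

The objective is to bound the terms $\mathbb{E}((\eta_i^2 - x_i^2)1\{i \in \hat m\})$ and
$\mathbb{E}((x_i^2 - \eta_i^2)1\{i \notin \hat m\})$. First, assume that
$\sigma_i^2 > 1/n^2$, i.e. $\mu_i = \beta\log(n^2\sigma_i^2)$. By Lemma 5.1, we know that

$$\mathbb{E}((\eta_i^2 - x_i^2)1\{i \in \hat m\}) \le 2K\beta\sigma_i^2 e^{-\mu_i/\beta} = \frac{2K\beta}{n^2}.$$

The same bound holds if $\sigma_i^2 \le 1/n^2$ with $\mu_i = 0$, as a straightforward
consequence of Lemma 5.1. On the other hand, note that if $i \notin \hat m$, then
$\mu_i = \beta\log(n^2\sigma_i^2)$. Lemma 5.1 warrants

$$\mathbb{E}((x_i^2 - \eta_i^2)1\{i \notin \hat m\}) \le \sigma_i^2(6\beta\log(n^2\sigma_i^2) + 2).$$

Since $i \in m^*$, we have $\sigma_i^2 < x_i^2 \le \|x^\dagger\|^2$ and therefore
$\log(n^2\sigma_i^2) \le 2\log n + \log\|x^\dagger\|^2$. We conclude by summing all the terms.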

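Proof of Theorem 4.1. The proof starts as in Theorem 3.1. We have

$$\|\hat x_{\hat m_\xi} - x^\dagger\|^2 = \|\hat x_{m^*_\xi} - x^\dagger\|^2 + \sum_{i \notin m^*_\xi}(\tilde\eta_i^2 - x_i^2)1\{i \in \hat m_\xi\} + \sum_{i \in m^*_\xi}(x_i^2 - \tilde\eta_i^2)1\{i \notin \hat m_\xi\},$$

and the objective is to bound the conditional expectation of each term separately. Using
successively Lemma 5.3 and Lemma 5.4, we get

$$\mathbb{E}_\xi((\tilde\eta_i^2 - x_i^2)1\{i \in \hat m_\xi\}) \le 4a^{-2}\beta' x_i^2\log n + \kappa_i(\xi),$$

with

$$\kappa_i(\xi) = \frac{4K\beta}{n^2} + \frac{4x_i^2\xi_i^2}{a^2 s^2}\,1\{\xi_i^2 > s^2\beta'\log n\}.$$

By Lemma 5.4, we know that $\kappa(\xi) \ge \sum_{i \notin m^*_\xi}\kappa_i(\xi)$ is such that

$$\mathbb{E}(\kappa(\xi)) \le \frac{4(K\beta + 2a^{-2}K'\beta'\|x^\dagger\|^2\log n)}{n} = O\Big(\frac{\log n}{n}\Big).$$

On the other hand, Lemma 5.3 gives, for $\theta = 8$,

$$\mathbb{E}_\xi((x_i^2 - \tilde\eta_i^2)1\{i \notin \hat m_\xi\}) \le 9\tilde\sigma_i^2\nu_i + 8\mathbb{E}_\xi(\tilde\eta_i^2) + x_i^2 1\{|\hat b_i| \le as\}.$$

For all $i \in m^*_\xi$, we know that $|\hat b_i| > |b_i|/2$. Thus, if $i \in m^*_\xi$,
$1\{|\hat b_i| \le as\} \le 1\{i \in M\}$, where we recall $M = \{i : |b_i| < 2as\}$. We know
also that, if $i \in m^*_\xi$, then $\tilde\sigma_i^2 \le x_i^2$. Thus,
$\nu_i = \beta\log(n^2\tilde\sigma_i^2) \le 2\beta\log n + \beta\log\|x^\dagger\|^2$. Noticing
that $\tilde\sigma_i^2 \le \mathbb{E}_\xi(\tilde\eta_i^2)$, we find

$$\mathbb{E}_\xi((x_i^2 - \tilde\eta_i^2)1\{i \notin \hat m_\xi\}) \le (18\beta\log n + 9\beta\log\|x^\dagger\|^2 + 8)\,\mathbb{E}_\xi(\tilde\eta_i^2) + x_i^2 1\{i \in M\}.$$

The result follows by summing all the terms, using that the risk of the oracle
$\hat x_{m^*_\xi}$ is

$$\mathbb{E}_\xi\|\hat x_{m^*_\xi} - x^\dagger\|^2 = \sum_{i \notin m^*_\xi} x_i^2 + \sum_{i \in m^*_\xi}\mathbb{E}_\xi(\tilde\eta_i^2).$$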

Proof of Corollary 4.2. It suffices to show that the term $\sum_{i \in M} x_i^2$ is of the same
order as the risk of the oracle. Write

$$\mathbb{E}\|\hat x_{m^*_\xi} - x^\dagger\|^2 \ge \sum_{i=1}^n x_i^2\,\mathbb{P}(i \notin m^*_\xi) \ge \sum_{i=1}^n x_i^2\,\mathbb{P}(|\hat b_i| < |b_i|/2).$$

For all $i \in M$, the probability $\mathbb{P}(|\hat b_i| < |b_i|/2)$ is greater than $C$ as a
consequence of A3. We deduce
$\sum_{i \in M} x_i^2 \le C^{-1}\,\mathbb{E}\|\hat x_{m^*_\xi} - x^\dagger\|^2$.